Semi-supervised Self-training for Sentence Subjectivity Classification

نویسندگان

  • Bin Wang
  • Bruce Spencer
  • Charles X. Ling
  • Harry Zhang
چکیده

Recent natural language processing (NLP) research shows that identifying and extracting subjective information from texts can benefit many NLP applications. In this paper, we address a semi-supervised learning approach, self-training, for sentence subjectivity classification. In self-training, the confidence degree that depends on the ranking of class membership probabilities is commonly used as the selection metric that ranks and selects the unlabeled instances for next training of underlying classifier. Naive Bayes (NB) is often used as the underlying classifier because its class membership probability estimates have good ranking performance. The first contribution of this paper is to study the performance of self-training using decision tree models, such as C4.5, C4.4, and naive Bayes tree (NBTree), as the underlying classifiers. The second contribution is that we propose an adapted Value Difference Metric (VDM) as the selection metric in self-training, which does not depend on class membership probabilities. Based on the Multi-Perspective Question Answering (MPQA) corpus, a set of experiments have been designed to compare the performance of self-training with different underlying classifiers using different selection metrics under various conditions. The experimental results show that the performance of self-training is improved by using VDM instead of the confidence degree, and self-training with NBTree and VDM outperforms self-training with other combinations of underlying classifiers and selection metrics. The results also show that the self-training approach can achieve comparable performance to the supervised learning models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Subjectivity Recognition on Word Senses via Semi-supervised Mincuts

We supplement WordNet entries with information on the subjectivity of its word senses. Supervised classifiers that operate on word sense definitions in the same way that text classifiers operate on web or newspaper texts need large amounts of training data. The resulting data sparseness problem is aggravated by the fact that dictionary definitions are very short. We propose a semi-supervised mi...

متن کامل

Sentence Subjectivity Detection with Weakly-Supervised Learning

This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity ...

متن کامل

مقایسه روش‌های مختلف یادگیری ماشین در خلاصه‌سازی استخراجی گفتار به گفتار فارسی بدون استفاده از رونوشت

In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of Speech summarization deals with extracting important and salient segments from speech in order to access, search, extract and browse speech files easier and in a less costly manner. In this paper, a new method for speech summarization without using automatic speech recognitio...

متن کامل

Creating Subjective and Objective Sentence Classifiers from Unannotated Texts

This paper presents the results of developing subjectivity classifiers using only unannotated texts for training. The performance rivals that of previous supervised learning approaches. In addition, we advance the state of the art in objective sentence classification by learning extraction patterns associated with objectivity and creating objective classifiers that achieve substantially higher ...

متن کامل

A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources

This paper introduces a method for creating a subjectivity lexicon for languages with scarce resources. The method is able to build a subjectivity lexicon by using a small seed set of subjective words, an online dictionary, and a small raw corpus, coupled with a bootstrapping process that ranks new candidate words based on a similarity measure. Experiments performed with a rule-based sentence l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008